Temporal Data Mining of Scientific Data Provenance

Authors

  • Peng Chen
  • Beth Plale
  • Mehmet Aktas
Abstract

Provenance is an important part of the metadata of a digital scientific data object. It can, however, quickly grow voluminous because it may be captured at a fine level of granularity. It can also be quite feature-rich. We propose a representation of provenance data based on logical time that reduces the feature space. After creating time-domain and frequency-domain representations of the provenance, we apply clustering, classification, and association rule mining to these abstract representations to determine the usefulness of the temporal representation. We evaluate the temporal representation using an existing 10 GB database of provenance captured from a range of scientific workflows.
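
The abstract does not spell out how the logical-time representation is built, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes a provenance graph is a DAG of (source, destination) edges, takes a node's logical time to be its longest-path distance from any source node, uses the node count per logical time step as the time-domain series, and applies a plain discrete Fourier transform to get a frequency-domain view. All function names and the example workflow are hypothetical.

```python
import cmath
from collections import defaultdict, deque


def logical_times(edges):
    """Assign each node a logical time (assumed here: longest path from a source)."""
    succ = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        succ[src].append(dst)
        indeg[dst] += 1
        nodes.update((src, dst))
    # Source nodes (no incoming edges) start at logical time 0.
    times = {n: 0 for n in nodes if indeg[n] == 0}
    queue = deque(sorted(times))
    # Kahn-style topological traversal; a node is final once all parents are seen.
    while queue:
        node = queue.popleft()
        for nxt in succ[node]:
            times[nxt] = max(times.get(nxt, 0), times[node] + 1)
            indeg[nxt] -= 1
            if indeg[nxt] == 0:
                queue.append(nxt)
    return times


def time_series(times):
    """Time-domain view: number of provenance nodes at each logical time step."""
    series = [0] * (max(times.values()) + 1)
    for t in times.values():
        series[t] += 1
    return series


def dft_magnitudes(series):
    """Frequency-domain view: magnitudes of the discrete Fourier transform."""
    n = len(series)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                for i, x in enumerate(series)))
        for k in range(n)
    ]


# Hypothetical workflow: two inputs feed one process whose output is used twice.
example_edges = [
    ("input_a", "align"), ("input_b", "align"),
    ("align", "merged"), ("merged", "plot"), ("merged", "stats"),
]
series = time_series(logical_times(example_edges))
spectrum = dft_magnitudes(series)
```

Fixed-length vectors such as `series` or `spectrum` are smaller than the full attributed graph, which is the kind of feature-space reduction the abstract describes; they could then be passed to a standard clustering or classification routine.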

Similar resources

Big Data Provenance: State-Of-The-Art Analysis and Emerging Research Challenges

This paper focuses on big data provenance issues and provides a comprehensive survey of state-of-the-art analysis and emerging research challenges in this scientific field. Big data provenance is one of the most relevant problems in big data research, as confirmed by the great deal of attention devoted to this topic by larger and larger database and data mining research co...

From Scientific Workflow Patterns to 5-star Linked Open Data

Scientific workflow management systems have been widely adopted by data-intensive science communities. Many efforts have been dedicated to the representation and exploitation of provenance to improve reproducibility in data-intensive sciences. However, few works address the mining of provenance graphs to annotate the produced data with domain-specific context for better interpretation and shar...

Provenance for Data Mining

Data mining aims at extracting useful information from large datasets. Most data mining approaches reduce the input data to produce a smaller output summarizing the mining result. While the purpose of data mining (extracting information) necessitates this reduction in size, the loss of information it entails can be problematic. Specifically, the results of data mining may be more confusing than...

Towards Low Overhead Provenance Tracking in Near Real-Time Stream Filtering

Data streams flowing from the physical environment are as unpredictable as the environment itself. Radars go down, long haul networks drop packets, and readings are corrupted on the wire. Yet the data driven scientific models and data mining algorithms do not necessarily account for the inaccuracies when assimilating the data. Low overhead provenance collection partially solves this problem. We...

Provenance, Tectonic Setting & Geochemical Maturity of the Early Miocene Pyawbwe Formation, Sakangyi-Thayet Area, Magway Region, Myanmar

The best-exposed Early Miocene (820 m thick) shales and interbedded silty sandstone beds of the Pyawbwe Formation in the Sakangyi-Thayet area, Magway Region, are investigated geochemically using a Siemens SRS- X Ray 303 AS XRF Spectrometer. Major and some trace element concentrations have been determined to establish their provenance, tectonic setting, paleoweathering, paleoclimate and ...

Publication date: 2012